Abstract

We give new bounds on the circuit complexity of the quantum Fourier transform (QFT). We give an upper bound of $O(\log n + \log\log(1/\epsilon))$ on the circuit depth for computing an approximation of the QFT with respect to the modulus $2^n$ with error bounded by $\epsilon$. Thus, even for exponentially small error, our circuits have depth $O(\log n)$. The best previous depth bound was $O(n)$, even for approximations with constant error. Moreover, our circuits have size $O(n \log(n/\epsilon))$. We also give an upper bound of $O(n (\log n)^2 \log\log n)$ on the circuit size of the exact QFT modulo $2^n$, for which the best previous bound was $O(n^2)$.

As an application of the above depth bound, we show that Shor's factoring algorithm may be based on quantum circuits with depth only $O(\log n)$ and polynomial size, in combination with classical polynomial-time pre- and post-processing. In the language of computational complexity, this implies that factoring is in the complexity class $\mathrm{ZPP}^{\mathrm{BQNC}}$, where BQNC is the class of problems computable with bounded-error probability by quantum circuits with poly-logarithmic depth and polynomial size.

Finally, we prove an $\Omega(\log n)$ lower bound on the depth complexity of approximations of the QFT with constant error. This implies that the above upper bound is asymptotically optimal (for a reasonable range of values of $\epsilon$).

1 Introduction and summary of results

In this paper we consider the quantum circuit complexity of the quantum Fourier transform (QFT). The quantum Fourier transform is the key quantum operation at the heart of Shor's quantum algorithms for factoring and computing discrete logarithms [34] and the known extensions and variants of these algorithms (see, e.g., Kitaev [25], Boneh and Lipton [10], Grigoriev [20], and Cleve, Ekert, Macchiavello, and Mosca [?]). The quantum Fourier transform also plays a key role in extensions of Grover's quantum searching technique [21], due to Brassard, Høyer, and Tapp [?] and Mosca [?].

In order to discuss the quantum Fourier transform in greater detail we recall the discrete Fourier transform (DFT); for a given dimension $m$ the discrete Fourier transform is a linear operator on $\mathbb{C}^m$ mapping $(a_0, a_1, \ldots, a_{m-1})$ to $(b_0, b_1, \ldots, b_{m-1})$, where

$$b_x = \sum_{y=0}^{m-1} \left(e^{2\pi i/m}\right)^{xy} a_y. \qquad (1)$$

The discrete Fourier transform has many important applications in classical computing, essentially due to the efficiency of the fast Fourier transform (FFT), which is an algorithm that computes the DFT with $O(m \log m)$ arithmetic operations, as opposed to the obvious $O(m^2)$ method. The FFT algorithm was proposed by Cooley and Tukey in 1965 [13], though its origins can be traced back to Gauss in 1866 [?]. The FFT plays an important role in digital signal processing, and it has been suggested [?] as a contender for the second most important nontrivial algorithm in practice, after fast sorting. The DFT (and the FFT algorithm) generalize to certain algebraic structures, such as rings containing primitive $m$-th roots of unity (which can play the role of $e^{2\pi i/m}$ in Eq. (1)). This more abstract type of FFT is a principal component in Schönhage and Strassen's fast multiplication algorithm [?], which can be expressed as circuits of size $O(n \log n \log\log n)$ for multiplying $n$-bit integers. For more applications (of which there are many) and historical information, see [27, ?, ?].
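The contrast between the obvious $O(m^2)$ method and the FFT can be made concrete with a small sketch. It assumes the sign convention $e^{2\pi i/m}$ of Eq. (1) and, for the fast version, that $m$ is a power of 2. The function names `dft` and `fft` are illustrative only, and the recursion shown is the textbook radix-2 scheme rather than any particular construction from the cited works.

```python
import cmath

def dft(a):
    """Direct O(m^2) DFT: b_x = sum_y (e^{2*pi*i/m})^(x*y) * a_y."""
    m = len(a)
    w = cmath.exp(2j * cmath.pi / m)
    return [sum(w ** (x * y) * a[y] for y in range(m)) for x in range(m)]

def fft(a):
    """Recursive radix-2 FFT with the same sign convention.

    Assumes len(a) is a power of 2; uses O(m log m) arithmetic operations.
    """
    m = len(a)
    if m == 1:
        return list(a)
    even = fft(a[0::2])   # DFT of the even-indexed entries (dimension m/2)
    odd = fft(a[1::2])    # DFT of the odd-indexed entries (dimension m/2)
    out = [0j] * m
    for x in range(m // 2):
        t = cmath.exp(2j * cmath.pi * x / m) * odd[x]
        out[x] = even[x] + t
        out[x + m // 2] = even[x] - t   # uses w^(x + m/2) = -w^x
    return out
```

Both functions return the same vector up to floating-point rounding; only the operation count differs.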



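The quantum Fourier transform (QFT) is a unitary operation that essentially performs the DFT on the amplitude vector of a quantum state: the QFT maps the quantum state $\sum_{x=0}^{m-1} \alpha_x |x\rangle$ to the state $\sum_{x=0}^{m-1} \beta_x |x\rangle$, where

$$\beta_x = \frac{1}{\sqrt{m}} \sum_{y=0}^{m-1} \left(e^{2\pi i/m}\right)^{xy} \alpha_y. \qquad (2)$$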

For certain values of $m$ there are very efficient quantum algorithms for the QFT. The fact that the quantum circuit size can be polynomial in $\log m$ for some values of $m$ was first observed by Shor [34] and is of critical importance in his polynomial-time algorithms for prime factorization and discrete logarithms. Shor's original method may be described as a "mixed-radix" method, and is discussed further in Section 7.2. In the particular case where $m = 2^n$, there exist quantum circuits performing the quantum Fourier transform with $O(n^2)$ gates, which was proved by Coppersmith [14] (see also [?]). These circuits are based on a recursive description of the QFT that is analogous to the description of the DFT exploited by the FFT. While in some sense these quantum circuits are exponentially faster than the classical FFT, the task that they perform is quite different. The QFT does not explicitly produce any of the values $\beta_0, \beta_1, \ldots, \beta_{m-1}$ as output (nor does it explicitly obtain any of the values $\alpha_0, \alpha_1, \ldots, \alpha_{m-1}$ as input). Intuitively, the difference between performing a DFT and a QFT can be thought of as being analogous to the difference between computing all the probabilities that comprise a probability distribution and sampling a probability distribution, the latter task being frequently much easier.



Coppersmith [14] also proposed quantum circuits that approximate the QFT with error bounded by $\epsilon$, and showed that such approximations can be computed by circuits of size $O(n \log(n/\epsilon))$ for modulus $2^n$. Such approximations can be thought of as unitary operations whose distance from the QFT (in the operator norm induced by Euclidean distance) is bounded by $\epsilon$. Kitaev [25] showed how the QFT for an arbitrary modulus $m$ can be approximated by circuits with size polynomial in $\log(m/\epsilon)$. For most information processing purposes, it suffices to use such approximations of quantum operations (for $\epsilon$ ranging from constant down to $1/n^{O(1)}$). Indeed, since it seems rather implausible to physically implement quantum gates with perfect accuracy, the need to ultimately consider approximations is likely inevitable. Thus, we believe that the most relevant consideration is to approximately compute the QFT, though exact computations of the QFT are still of interest as part of the mathematical theory of quantum computation.

Moore and Nilsson showed how to obtain logarithmic-depth circuits that perform encoding and decoding for standard quantum error-correcting codes. For the QFT, in both the exact and approximate case, the gates in Coppersmith's circuits can be arranged so as to have depth $2n - 1$, as noted in [28], but not less depth than this. Similarly, the techniques of Shor and of Kitaev have polynomial depth. Our first result shows that it is possible to compute good approximations of the QFT with logarithmic-depth quantum circuits.

Theorem 1 For any $n$ and $\epsilon$ there is a quantum circuit approximating the QFT modulo $2^n$ with precision $\epsilon$ that has size $O(n \log(n/\epsilon))$ and depth $O(\log n + \log\log(1/\epsilon))$.

By an approximation of a unitary operation $U$ with precision $\epsilon$, we mean a unitary operation $V$ (possibly acting on additional ancilla qubits) with the following property. For any input (pure) quantum state, the Euclidean distance between applying $U$ to the state and $V$ to the state is at most $\epsilon$ (in the Hilbert space that includes the input/output qubits and the ancilla qubits). Also, whenever we refer to circuits, unless otherwise stated, there is an implicit assumption that the circuits belong to a logarithmic-space uniformly generated family in the usual way (via a classical Turing machine). In Section 7.2, we consider a different approach for parallelizing Shor's QFT method, which gives somewhat worse bounds.

The proof of Theorem 1 follows the general approach introduced by Kitaev [25], with several efficiency improvements as well as parallelizations. In particular, we introduce a new parallel method for performing multiprecision phase estimation.

We also show that, if size rather than depth is the primary consideration, it is possible to 
compute the QFT exactly with a near-linear number of gates. 

Theorem 2 For any $n$ there is a quantum circuit that exactly computes the QFT modulo $2^n$ that has size $O(n (\log n)^2 \log\log n)$ and depth $O(n)$.

Theorem 2 is based on a nonstandard recursive description of the QFT combined with an asymptotically fast multiplication algorithm [33].

There are several reasons why we believe results regarding quantum circuit complexity, such as 
in the above theorems, are important. First, circuit depth is likely to be particularly relevant in the 
quantum setting for physical reasons. Perhaps most notably, fault tolerant quantum computation 
necessarily requires parallelization anyway [?]: under various noise models, error correction must
continually be applied in parallel to the qubits of a quantum computer, even when the qubits 
are doing nothing. In such models, parallelization saves not only the total amount of time, but 
also the total amount of work. Furthermore, informally speaking, the depth of a quantum circuit 
corresponds to the amount of time coherence must be preserved, so in addition to saving work, 
parallelization may allow for larger quantum circuits to be implemented within systems having 
shorter decoherence times or using less extensive error correction. A final reason is that such 
results suggest alternate methods for performing various operations, which may in turn suggest 
or shed light on quantum algorithms for other problems or more general methods for improving 
efficiency of quantum algorithms.

It has long been known that the main bottleneck of the quantum portion of Shor's factoring algorithm is not the QFT, but rather the modular exponentiation step. If it were possible to perform modular exponentiation by classical circuits with poly-logarithmic depth and polynomial size then it would be possible to implement Shor's factoring algorithm in poly-logarithmic depth with a polynomial number of qubits. Although no such algorithm is known for modular exponentiation, we can prove the following weaker result, which nevertheless implies that quantum computers need only run for poly-logarithmic time for factoring to be feasible.

Theorem 3 There is an algorithm for factoring $n$-bit integers that consists of: a classical pre-processing stage, computed by a polynomial-size classical circuit; followed by a quantum information processing stage, computed by an $O(\log n)$-depth, $O(n^{?} (\log n)^{?})$-size quantum circuit^1; followed by a classical post-processing stage, computed by a polynomial-size classical circuit. Furthermore, the size of the quantum circuit can be reduced if a larger depth is allowed. In particular, the size can be reduced to $O(n^{?})$ if the depth is increased to $O((\log n)^{?})$.

If we define the complexity class BQNC as all computational problems that can be solved by quantum circuits with poly-logarithmic depth and polynomial size (a reasonably natural extension of previous notation; see, e.g., [?]), then Theorem 3 implies that the factoring problem is in $\mathrm{ZPP}^{\mathrm{BQNC}}$.

Finally, we consider the minimum depth required for approximating the QFT. It is fairly easy to show that computing the QFT exactly requires depth at least $\log n$. However, this is less clear in the case of approximations, and we exhibit a problem related to the QFT whose depth complexity decreases from $\log n$ in the exact case to $O(\log\log n)$ for approximations with precision $\epsilon$, whenever $\epsilon \in 1/n^{O(1)}$. Nevertheless, we show the following.

Theorem 4 Any quantum circuit consisting of one- and two-qubit gates that approximates the 
QFT with precision or smaller must have depth at least logn. 

This implies that the depth upper bound in Theorem 1 is asymptotically optimal for a reasonable range of values of $\epsilon$.

The remainder of this paper is organized as follows. In Section 2, we review some definitions and introduce notation that is used in subsequent sections. In Section 3 we prove the depth and size bounds for quantum circuits approximating the quantum Fourier transform for any power-of-2 modulus as claimed in Theorem 1, and in Section 4 we prove the size bound claimed in Theorem 2 for exactly computing the quantum Fourier transform. In Section 5 we prove Theorem 3 by demonstrating how Shor's factoring algorithm can be arranged so as to require only logarithmic-depth quantum circuits. In Section 6 we prove the lower bound for the QFT in Theorem 4. In Section 7 we discuss the situation when the modulus for the quantum Fourier transform is not necessarily a power of 2, including arbitrary moduli and the special case of "smooth" moduli considered in Shor's original method for performing the quantum Fourier transform. We conclude with Section 8, which mentions some directions for future work relating to this paper.



^1 In this case, the underlying circuit family is polynomial-time uniform rather than logarithmic-space uniform.

2 Definitions and notation



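Notation for special quantum states: For an $n$-bit modulus $m$, we will identify each $x \in \mathbb{Z}_m$ with its binary representation $x_{n-1} \cdots x_1 x_0 \in \{0,1\}^n$. For $x \in \mathbb{Z}_m$, the state $|x\rangle = |x_{n-1} \cdots x_1 x_0\rangle$ is called a computational basis state. For $x \in \mathbb{Z}_m$, the state

$$|\psi_x\rangle = \frac{1}{\sqrt{m}} \sum_{y=0}^{m-1} \left(e^{2\pi i/m}\right)^{xy} |y\rangle \qquad (3)$$

is a Fourier basis state with phase parameter $x$. As noted in [?], when $m = 2^n$, $|\psi_x\rangle$ can be factored as follows:

$$|\psi_{x_{n-1} \cdots x_1 x_0}\rangle = \frac{1}{\sqrt{2^n}} \left(|0\rangle + e^{2\pi i (0.x_0)}|1\rangle\right) \left(|0\rangle + e^{2\pi i (0.x_1 x_0)}|1\rangle\right) \cdots \left(|0\rangle + e^{2\pi i (0.x_{n-1} \cdots x_1 x_0)}|1\rangle\right). \qquad (4)$$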

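For convenience, we define the state

$$|\mu_\theta\rangle = \frac{1}{\sqrt{2}} \left(|0\rangle + e^{2\pi i \theta}|1\rangle\right), \qquad (5)$$

where $\theta$ is a real parameter. Using this notation, we can rewrite Eq. (4) as

$$|\psi_{x_{n-1} \cdots x_1 x_0}\rangle = |\mu_{0.x_0}\rangle |\mu_{0.x_1 x_0}\rangle \cdots |\mu_{0.x_{n-1} \cdots x_1 x_0}\rangle. \qquad (6)$$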
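The product form of a Fourier basis state can be checked numerically for small $n$. The sketch below assumes the normalization $1/\sqrt{2^n}$ and the convention that the leftmost qubit of $|y_{n-1} \cdots y_0\rangle$ is the most significant bit of $y$; the helper names are illustrative and not taken from the paper.

```python
import cmath
from itertools import product

def fourier_state(x, n):
    """Amplitudes of |psi_x> for modulus 2^n, indexed by y = 0, ..., 2^n - 1."""
    m = 2 ** n
    return [cmath.exp(2j * cmath.pi * x * y / m) / m ** 0.5 for y in range(m)]

def mu(theta):
    """Amplitudes of |mu_theta> = (|0> + e^{2 pi i theta}|1>) / sqrt(2)."""
    return [2 ** -0.5, cmath.exp(2j * cmath.pi * theta) * 2 ** -0.5]

def product_form(x, n):
    """Amplitudes of |mu_{0.x_0}> |mu_{0.x_1 x_0}> ... |mu_{0.x_{n-1}...x_0}>."""
    # The binary fraction 0.x_j ... x_0 equals (x mod 2^(j+1)) / 2^(j+1).
    factors = [mu((x % 2 ** (j + 1)) / 2 ** (j + 1)) for j in range(n)]
    state = []
    for bits in product((0, 1), repeat=n):   # bits[0] is the leftmost qubit
        amp = 1 + 0j
        for j, b in enumerate(bits):
            amp *= factors[j][b]
        state.append(amp)
    return state
```

For every phase parameter `x`, `fourier_state(x, n)` and `product_form(x, n)` agree entry by entry up to rounding, which is the content of the factorization.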

Definition of the QFT: The quantum Fourier transform (QFT) is the unitary operation that maps $|x\rangle$ to $|\psi_x\rangle$ (for all $x \in \mathbb{Z}_m$).

Mappings related to the QFT: A quantum Fourier state computation (QFS) is any unitary operation that maps $|x\rangle|0\rangle$ to $|x\rangle|\psi_x\rangle$ (for all $x \in \mathbb{Z}_m$). When the input is a computational basis state, this computes the corresponding Fourier state, but without erasing the input. We refer to approximations of a QFS as Fourier state estimation. A quantum Fourier phase computation (QFP) is any unitary operation that maps $|\psi_x\rangle|0\rangle$ to $|\psi_x\rangle|x\rangle$ (for all $x \in \mathbb{Z}_m$). When the input is a Fourier basis state, this computes the corresponding phase parameter, but without erasing the input. We refer to approximations of a QFP as Fourier phase estimation. As pointed out by Kitaev [25], the QFT can be computed by composing a QFS and the inverse of a QFP: $|x\rangle|0\rangle \mapsto |x\rangle|\psi_x\rangle \mapsto |0\rangle|\psi_x\rangle$.

Quantum gates: All of the quantum circuits that we construct will be composed of three types of unitary gates. One is the one-qubit Hadamard gate, $H$, which maps $|x\rangle$ to $\frac{1}{\sqrt{2}}(|0\rangle + (-1)^x|1\rangle)$ (for $x \in \{0,1\}$). Another is the one-qubit phase shift gate, $P(\theta)$, where $\theta$ is a parameter of the form $x/2^n$ (for an integer $x$); $P(\theta)$ maps $|x\rangle$ to $e^{2\pi i \theta x}|x\rangle$ (for $x \in \{0,1\}$). Finally, we use two-qubit controlled-phase shift gates, controlled-$P(\theta)$ ($c$-$P(\theta)$ for short), which map $|x\rangle|y\rangle$ to $e^{2\pi i \theta x y}|x\rangle|y\rangle$ (for $x, y \in \{0,1\}$). Note that this set is universal, and in particular that any (classical) reversible circuit can be composed of these gates.



3 New depth bounds for the QFT 

The main purpose of this section is to prove Theorem 1.

First, we review the approach of Kitaev [25] for performing the QFT for an arbitrary modulus $m$. By linearity, it is sufficient to give a circuit that operates correctly on computational basis states. Given a computational basis state $|x\rangle$, first create the Fourier basis state with phase parameter $x$ (which can be done easily if $|x\rangle$ is not erased in the process). The system is now in the state $|x\rangle|\psi_x\rangle$. Now, by performing Fourier phase estimation, the state $|x\rangle|\psi_x\rangle$ can be approximated from the state $|0\rangle|\psi_x\rangle$. Therefore, by performing the inverse of Fourier phase estimation on the state $|x\rangle|\psi_x\rangle$, a good estimate of the state $|0\rangle|\psi_x\rangle$ is obtained.

The particular phase estimation procedure used by Kitaev does not readily parallelize, but, in 
the case where the modulus is a power of 2, we give a new phase estimation procedure that does 
parallelize. This procedure requires several copies of the Fourier basis state rather than just one. 
To ensure that the entire process parallelizes, we must parallelize the creation of the Fourier basis
state as well as the process of copying and uncopying this state. 

The basic steps of our technique are as follows: 

1. Creation of the Fourier basis state, which is the mapping

$$|x\rangle|0\rangle \mapsto |x\rangle|\psi_x\rangle.$$

2. Copying the Fourier basis state, which is the mapping

$$|\psi_x\rangle|0\rangle \cdots |0\rangle \mapsto |\psi_x\rangle|\psi_x\rangle \cdots |\psi_x\rangle.$$

3. Erasing the computational basis state by means of estimating the phase of the Fourier basis state, which is the mapping

$$|x\rangle|\psi_x\rangle|\psi_x\rangle \cdots |\psi_x\rangle \mapsto |0\rangle|\psi_x\rangle|\psi_x\rangle \cdots |\psi_x\rangle.$$

4. Reverse step 2, which is the mapping

$$|\psi_x\rangle|\psi_x\rangle \cdots |\psi_x\rangle \mapsto |\psi_x\rangle|0\rangle \cdots |0\rangle.$$

Each of these components is discussed in detail in the subsections that follow. Throughout we assume the modulus is $m = 2^n$.

3.1 Parallel Fourier state computation and estimation 

The first step is the creation of the Fourier basis state corresponding to a given computational basis state $|x\rangle$. This corresponds to the mapping

$$|x\rangle|0\rangle \mapsto |x\rangle|\psi_x\rangle. \qquad (7)$$

First let us consider a circuit that performs this transformation exactly. By Eq. (4) (equivalently, Eq. (6)), it suffices to compute the states $|\mu_{0.x_0}\rangle$, $|\mu_{0.x_1 x_0}\rangle$, ..., $|\mu_{0.x_{n-1} \cdots x_1 x_0}\rangle$ individually.

The circuit suggested by Figure 1 performs the required transformation for $|\mu_{0.x_j \cdots x_0}\rangle$. In this figure we have not labelled the controlled phase shift gates, $c$-$P(\theta)$ (such gates are defined in Section 2), which are the gates in the center drawn as two solid circles connected by a line. In the above case, the phase $\theta$ depends on $j$ and on the particular qubit of $|x_{n-1} \cdots x_1 x_0\rangle$ on which the gate acts. The value of $\theta$ for the controlled phase shift acting on $|x_i\rangle$ is $2^{i-j-1}$ (for $i \in \{0, 1, \ldots, j\}$). From this, it may be verified that the circuit acts as indicated. The depth of this circuit is $O(\log n)$ and the size is $O(n)$.

Figure 1: Quantum circuit for the exact preparation of $|\mu_{0.x_j \cdots x_0}\rangle$.

If such a circuit is to be applied for each value of $j \in \{0, 1, \ldots, n-1\}$, in order to perform the mapping (7), then the qubits $|x_{n-1}\rangle, \ldots, |x_1\rangle, |x_0\rangle$ must first be copied several times ($n - i$ times for $|x_i\rangle$) to allow the controlled phase shift gates to operate in parallel. This may be performed (and inverted appropriately) in size $O(n^2)$ and depth $O(\log n)$ in the most obvious way. We conclude that the transformation (7) can be performed by circuits of size $O(n^2)$ and depth $O(\log n)$ in the exact case.

In order to reduce the size of the circuit in the approximate case, we use a similar procedure, except we only perform the controlled phase shifts when the phase $\theta$ is significant. An illustration of such a circuit is given in Figure 2. Here $k$ denotes the number of significant phase shift gates that are used. The distance $\| |\mu_{0.x_j \cdots x_0}\rangle - |\mu_{0.x_j \cdots x_{j-k+1}}\rangle \|$ is less than $2\pi \cdot 2^{-k}$, since the two phases differ by less than $2^{-k}$; it is therefore bounded by a fixed positive power of $\epsilon/n$ for a suitable $k \in O(\log(n/\epsilon))$. With such a setting of $k$, the precision of the approximation of $|\mu_{0.x_{n-1} \cdots x_0}\rangle \cdots |\mu_{0.x_0}\rangle$ can be $O(\epsilon)$. Note that the size of the resulting circuit is $O(n \log(n/\epsilon))$ and the depth is $O(\log\log(n/\epsilon))$.
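The effect of keeping only the $k$ most significant phase bits can be illustrated numerically. The sketch below is not the paper's circuit: it only compares the exact one-qubit state with its truncated version. The bound $2\pi \cdot 2^{-k}$ and the helper `bits_needed` are a simple sufficient choice assumed here for illustration, not a constant taken from the source.

```python
import cmath
import math

def mu(theta):
    """Amplitudes of |mu_theta> = (|0> + e^{2 pi i theta}|1>) / sqrt(2)."""
    return (2 ** -0.5, cmath.exp(2j * cmath.pi * theta) * 2 ** -0.5)

def truncation_error(x, j, k):
    """Euclidean distance between |mu_{0.x_j...x_0}> and |mu_{0.x_j...x_{j-k+1}}>."""
    low = x % 2 ** (j + 1)            # the bits x_j ... x_0 as an integer
    drop = max(j + 1 - k, 0)          # number of low-order bits discarded
    kept = (low >> drop) << drop      # same integer with those bits zeroed
    a = mu(low / 2 ** (j + 1))
    b = mu(kept / 2 ** (j + 1))
    return math.hypot(abs(a[0] - b[0]), abs(a[1] - b[1]))

def bits_needed(n, eps):
    """A sufficient (not optimal) k with 2*pi*2^-k at most eps/n."""
    return math.ceil(math.log2(2 * math.pi * n / eps))
```

Since the distance between two product states is at most the sum of the distances between corresponding factors, choosing `k = bits_needed(n, eps)` keeps the total error over all `n` factors at most `eps`, with `k` growing only as $O(\log(n/\epsilon))$.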

3.2 Copying a Fourier state in parallel 

In this section, we show how to efficiently produce $k$ copies of an $n$-qubit Fourier state from one copy. This is a unitary operation that acts on $k$ $n$-qubit registers (thus $kn$ qubits in all) and maps $|\psi_x\rangle|0^n\rangle \cdots |0^n\rangle$ to $|\psi_x\rangle|\psi_x\rangle \cdots |\psi_x\rangle$ for all $x \in \{0,1\}^n$. The copying circuit will be exact and have size $O(kn)$ and depth $O(\log(kn))$. The setting of $k$ will be $O(\log(n/\epsilon))$.

Figure 2: Quantum circuit for the approximate preparation of $|\mu_{0.x_j \cdots x_0}\rangle$.

Let us begin by considering the problem of producing two copies of a Fourier state from one. First, define the (reversible) addition and (reversible) subtraction operations as the mappings

$$|x\rangle|y\rangle \mapsto |x\rangle|y + x\rangle$$
$$|x\rangle|y\rangle \mapsto |x\rangle|y - x\rangle$$

(respectively), where $x, y \in \{0,1\}^n$ and additions and subtractions are performed as integers modulo $2^n$. By appealing to classical results about the complexity of arithmetic [?], one can construct quantum circuits of size $O(n)$ and depth $O(\log n)$ for these operations (using an ancilla of size $O(n)$).

It is straightforward to show that applying a subtraction to the state $|\psi_x\rangle|\psi_y\rangle$ results in the state $|\psi_{x+y}\rangle|\psi_y\rangle$. Also, the state $|\psi_0\rangle$ can be obtained from $|0^n\rangle$ by applying a Hadamard transform independently to each qubit. Therefore, the copying operation can begin with a state of the form $|0^n\rangle|\psi_x\rangle$ and consist of these two steps:

1. Apply $H$ to each of the first $n$ qubits.

2. Apply the subtraction operation to the $2n$ qubits.

The resulting state will be $|\psi_x\rangle|\psi_x\rangle$.
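The two copying steps can be simulated directly on amplitude tables for small $n$. This sketch assumes the subtraction acts as $|a\rangle|b\rangle \mapsto |a\rangle|b - a \bmod 2^n\rangle$ and stores the two-register state as a $2^n \times 2^n$ table; the function names are illustrative.

```python
import cmath

def fourier_state(x, n):
    """Amplitudes of |psi_x> for modulus 2^n."""
    m = 2 ** n
    return [cmath.exp(2j * cmath.pi * x * y / m) / m ** 0.5 for y in range(m)]

def copy_fourier_state(x, n):
    """Return out[a][c], the amplitude of |a>|c> after both copying steps.

    The starting state is |0^n>|psi_x>.
    """
    m = 2 ** n
    psi0 = fourier_state(0, n)   # step 1: Hadamards turn |0^n> into |psi_0>
    psix = fourier_state(x, n)
    out = [[0j] * m for _ in range(m)]
    for a in range(m):
        for b in range(m):
            # step 2: reversible subtraction |a>|b> -> |a>|b - a mod 2^n>
            out[a][(b - a) % m] += psi0[a] * psix[b]
    return out
```

The resulting table factors as `psi[a] * psi[c]` with `psi = fourier_state(x, n)`, i.e. both registers hold the same Fourier state. No cloning of an unknown state is involved, because the operation only needs to work on the Fourier basis.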

An obvious method for computing $k$ copies of a Fourier state is to repeatedly apply the above doubling operation. This will result in a quantum circuit of size $O(kn)$; however, its depth will be $O((\log k)(\log n))$, which is too large for our purposes.

The depth bound can be improved to $O(\log(kn))$ by applying other classical circuit constructions to efficiently implement the (reversible) prefix addition and (reversible) telescoping subtraction operations, which are the mappings

$$|x_1\rangle|x_2\rangle \cdots |x_k\rangle \mapsto |x_1\rangle|x_1 + x_2\rangle \cdots |x_1 + x_2 + \cdots + x_k\rangle$$
$$|x_1\rangle|x_2\rangle \cdots |x_k\rangle \mapsto |x_1\rangle|x_2 - x_1\rangle \cdots |x_k - x_{k-1}\rangle$$

(respectively), where $x_1, x_2, \ldots, x_k \in \{0,1\}^n$. Before addressing the issue of efficiently implementing these operations, let us note that the copying operation can be performed by starting with the state $|0^n\rangle \cdots |0^n\rangle|\psi_x\rangle$ and performing these two steps:

1. Apply $H$ to all of the first $(k-1)n$ qubits.

2. Apply the telescoping subtraction operation to the $kn$ qubits.

The resulting state will be $|\psi_x\rangle \cdots |\psi_x\rangle$.

Now, to implement the prefix addition and telescoping subtraction, note that they are inverses of each other. This means that it is sufficient to implement each one efficiently by a classical (non-reversible) circuit, and then combine these to produce a reversible circuit by standard techniques in reversible computing [?]. The telescoping subtraction clearly consists of $k - 1$ subtractions that can be performed in parallel, so the nonreversible size and depth bounds are $O(kn)$ and $O(\log n)$ respectively.

The prefix addition is a little more complicated. It relies on a combination of well-known tools in classical circuit design. One of them is the following general result of Ladner and Fischer about parallel prefix computations.

Theorem 5 (Ladner and Fischer) For any associative binary operation $\circ$, the mapping

$$(x_1, x_2, \ldots, x_k) \mapsto (x_1, x_1 \circ x_2, \ldots, x_1 \circ x_2 \circ \cdots \circ x_k) \qquad (8)$$

can be computed by a circuit consisting of $(x, y) \mapsto (x, x \circ y)$ gates that has size $O(k)$ and depth $O(\log k)$.
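A simple recursive scheme achieving linear size and logarithmic depth can be sketched as follows. It pairs adjacent inputs, recurses on the pairs, and then fills in the remaining positions. This is one standard parallel prefix layout, assumed here for illustration; it is not claimed to be the exact construction of Ladner and Fischer. The returned `size` and `depth` count applications of the binary operation and layers of such applications.

```python
def prefix_scan(xs, op):
    """Return (prefixes, size, depth) for an associative operation op.

    prefixes[i] equals xs[0] op xs[1] op ... op xs[i], with operand order kept,
    so op does not need to be commutative.
    """
    k = len(xs)
    if k == 1:
        return list(xs), 0, 0
    # Layer 1: combine adjacent pairs; all of these gates act in parallel.
    pairs = [op(xs[i], xs[i + 1]) for i in range(0, k - 1, 2)]
    # Recursively compute the prefixes of the pair results.
    sub, size, depth = prefix_scan(pairs, op)
    out = [xs[0]]
    fixups = 0
    for i in range(1, k):
        if i % 2 == 1:
            out.append(sub[i // 2])                 # already a full prefix
        else:
            out.append(op(sub[i // 2 - 1], xs[i]))  # one extra gate, in parallel
            fixups += 1
    return out, size + len(pairs) + fixups, depth + 1 + (1 if fixups else 0)
```

For `k` inputs this uses fewer than `2 * k` gates and depth at most `2 * ceil(log2(k))`, which matches the $O(k)$ size and $O(\log k)$ depth of the theorem up to constants.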

Another tool is the so-called three-two adder, which is a circuit that takes three $n$-bit integers $x, y, z$ as input and produces two $n$-bit integers $s, c$ as output, such that $x + y + z = s + c$ (recall that addition is in modulo $2^n$ arithmetic). It is remarkable that a three-two adder can be implemented with constant depth and size $O(n)$. By combining two three-two adders, one can implement a size $O(n)$ and depth $O(1)$ four-two adder, that performs the mapping $(x, y, z, w) \mapsto (x, y, s, c)$, where $x + y + z + w = s + c$. Now, consider the pairwise representation of each $n$-bit integer $z$ as a pair of two $n$-bit integers $(z', z'')$ such that $z = z' + z''$. This representation is not unique, but it is easy to convert to and from the pairwise representation: the respective mappings are $z \mapsto (z, 0^n)$ and $(z', z'') \mapsto z' + z''$. The useful observation is that the four-two adder performs integer addition in the pairwise representation scheme, and it does so in constant depth and size $O(n)$.
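The three-two adder is the classical carry-save step: each output bit depends on only a constant number of input bits, which is why the depth is constant. A minimal sketch on Python integers follows, assuming arithmetic modulo $2^n$; the function names are illustrative, and the functions return only the pair $(s, c)$ rather than the full reversible mapping.

```python
def three_two(x, y, z, n):
    """Carry-save step: return (s, c) with s + c = x + y + z (mod 2^n)."""
    mask = (1 << n) - 1
    s = x ^ y ^ z                                     # bitwise sum, carries ignored
    c = (((x & y) | (x & z) | (y & z)) << 1) & mask   # carries, moved one place left
    return s, c

def four_two(x, y, z, w, n):
    """Two three-two adders in sequence: s + c = x + y + z + w (mod 2^n)."""
    s1, c1 = three_two(x, y, z, n)
    return three_two(s1, c1, w, n)

def add_pairwise(p, q, n):
    """Add two integers given in pairwise form (z', z'') with one four-two adder."""
    return four_two(p[0], p[1], q[0], q[1], n)
```

For example, `add_pairwise((3, 5), (7, 9), 4)` returns a pair whose sum is congruent to 24 modulo 16. No carry ever propagates across more than one bit position inside these functions; the only carry-propagating addition is the final conversion $(z', z'') \mapsto z' + z''$.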

Now, the following procedure computes prefix addition in size $O(kn)$ and depth $O(\log k + \log n) = O(\log(kn))$. The input is $(x_1, x_2, \ldots, x_k)$.

1. Convert the $k$ integers into their pairwise representation.

2. Apply the parallel prefix circuit of Theorem 5 to perform the prefix additions in the pairwise representation scheme.

3. Convert the $k$ integers from their pairwise representation to their standard form.

The output will be $(x_1, x_1 + x_2, \ldots, x_1 + x_2 + \cdots + x_k)$, as required.

Note that step 4 of the main algorithm has a circuit of identical size and depth to the one just 
described, as it is simply its inverse. 

3.3 Estimating the phase of a Fourier state in parallel 

Finally, we will discuss the third step of the main algorithm, which corresponds to the mapping

$$|\psi_x\rangle|\psi_x\rangle \cdots |\psi_x\rangle|x\rangle \mapsto |\psi_x\rangle|\psi_x\rangle \cdots |\psi_x\rangle|0\rangle \qquad (9)$$

for $x \in \{0,1\}^n$. The number of copies of $|\psi_x\rangle$ required for this step depends on the error bound $\epsilon$; we will require $k \in O(\log(n/\epsilon))$ copies. As discussed in subsection 3.1, any Fourier basis state $|\psi_x\rangle$ may be decomposed as $|\psi_x\rangle = |\mu_{x 2^{-1}}\rangle|\mu_{x 2^{-2}}\rangle \cdots |\mu_{x 2^{-n}}\rangle$. Thus, we may assume that we have $k$ copies of each of the states $|\mu_{x 2^{-j}}\rangle$.

First, for each $j = 1, \ldots, n$, the circuit will simulate measurements of the $k$ copies of $|\mu_{x 2^{-j}}\rangle$ (in the bases discussed below) in order to obtain an approximation $l_j/4$ to the fractional part of $2^{-j}x$. The approximation is with respect to the function $|\cdot|_1$ defined as

$$|y|_1 = \min\{z \in [0,1) : \text{either } y - z \in \mathbb{Z} \text{ or } y + z \in \mathbb{Z}\}$$

for $y \in \mathbb{R}$ (i.e., "modulo 1" distance). With high probability the approximations will result in $l_1, \ldots, l_n$ satisfying $|l_j/4 - 2^{-j}x|_1 < 1/4$ for each $j$. The (simulated) measurements can be performed in parallel, and each $l_j$ will be determined by considering the mode of the outcomes of the measurements and thus can be computed in parallel as well. Next the circuit will reconstruct an approximation $\tilde{x}$ to $x$ (in parallel) from $l_1, \ldots, l_n$. The circuit then XORs this value of $\tilde{x}$ to the register containing $x$, thereby "erasing" it with high probability. Finally, the circuit inverts the computation of this $\tilde{x}$ to clean up any garbage from the computation. As in subsection 3.2, standard techniques may be used to implement these computations as reversible circuits. We now describe each of the above steps in more detail.

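Let us first recall the following fact from probability theory (see, e.g., Goldreich [?]). If $X_1, \ldots, X_t$ are independent Bernoulli trials with probability $p_X$ of success and $Y_1, \ldots, Y_t$ are independent Bernoulli trials with probability $p_Y$ of success, where $p_X < p_Y$, then the probability

$$\Pr\left[\sum_{i=1}^{t} X_i \ge \sum_{i=1}^{t} Y_i\right]$$

is exponentially small in $(p_Y - p_X)^2 t$.

Now, define

$$|b_0\rangle = \tfrac{1}{\sqrt{2}}|0\rangle + \tfrac{1}{\sqrt{2}}|1\rangle, \quad |b_1\rangle = \tfrac{1}{\sqrt{2}}|0\rangle + \tfrac{i}{\sqrt{2}}|1\rangle, \quad |b_2\rangle = \tfrac{1}{\sqrt{2}}|0\rangle - \tfrac{1}{\sqrt{2}}|1\rangle, \quad |b_3\rangle = \tfrac{1}{\sqrt{2}}|0\rangle - \tfrac{i}{\sqrt{2}}|1\rangle,$$

and consider measurements of the states $|\mu_{x 2^{-j}}\rangle$ in the bases $\{|b_0\rangle, |b_2\rangle\}$ and $\{|b_1\rangle, |b_3\rangle\}$ (these measurements correspond to measurements of the Pauli operators $\sigma_x$ and $\sigma_y$, respectively). In particular, given that we have $k$ copies of each $|\mu_{x 2^{-j}}\rangle$, suppose that each of the above two measurements is performed independently on $k/2$ of the copies. Let $l_j \in \{0, 1, 2, 3\}$ represent the basis state that occurs with the highest frequency in these measurements for each $j$, breaking ties arbitrarily. We claim that the inequality $|l_j/4 - 2^{-j}x|_1 < 1/4$ is satisfied with high probability:

$$\Pr\left[\,|l_j/4 - 2^{-j}x|_1 \ge 1/4\,\right] < 4e^{-k/?}. \qquad (10)$$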

To prove that the inequality (10) holds, let us suppose that $x$ and $j$ are fixed, and let us define

$$p_0 = |\langle b_0|\mu_{x 2^{-j}}\rangle|^2, \quad p_1 = |\langle b_1|\mu_{x 2^{-j}}\rangle|^2, \quad p_2 = |\langle b_2|\mu_{x 2^{-j}}\rangle|^2, \quad p_3 = |\langle b_3|\mu_{x 2^{-j}}\rangle|^2.$$

These are the probabilities associated with the above measurements, meaning that the probability that a measurement of $|\mu_{x 2^{-j}}\rangle$ in the $\{|b_0\rangle, |b_2\rangle\}$ basis yields 0 is $p_0$, the probability that the measurement yields 2 is $p_2$, and similarly for $p_1$ and $p_3$ when the measurement in the $\{|b_1\rangle, |b_3\rangle\}$ basis is performed. Now, note the following two facts: (i) it must be that $\max\{p_0, p_1, p_2, p_3\} \ge 1/2 + \sqrt{2}/4$ (for any choice of $x$ and $j$), and (ii) if $|l/4 - 2^{-j}x|_1 \ge 1/4$, then we must have $p_l \le 1/2$. Therefore, if the inequality is not satisfied for some $j$ (i.e., if $|l_j/4 - 2^{-j}x|_1 \ge 1/4$), then it must be the case that $p_{l'} - p_{l_j} \ge \sqrt{2}/4$ for some different value $l' \ne l_j$. Based on the inequalities above, we conclude that a very improbable event has taken place: the probability of the result $l_j$ appearing more frequently than $l'$ is at most $2e^{-k/?}$. Unless $|2^{-j}x|_1 \in \{0, 1/4, 1/2, 3/4\}$ there are at most two values of $l$ that give $|l/4 - 2^{-j}x|_1 \ge 1/4$, and so in this case we conclude that (10) holds. (In the special case $|2^{-j}x|_1 \in \{0, 1/4, 1/2, 3/4\}$, the inequality (10) follows trivially.)
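Facts (i) and (ii) can be checked numerically. The sketch below assumes the four basis states are $|b_l\rangle = (|0\rangle + i^l|1\rangle)/\sqrt{2}$, i.e. the eigenvectors of $\sigma_x$ and $\sigma_y$, which gives $p_l = \cos^2(\pi(\phi - l/4))$ for a state $|\mu_\phi\rangle$. The function names are illustrative.

```python
import math

def dist_mod1(y):
    """|y|_1: the distance from y to the nearest integer."""
    f = y % 1.0
    return min(f, 1.0 - f)

def outcome_prob(l, phi):
    """p_l = |<b_l|mu_phi>|^2, assuming |b_l> = (|0> + i^l |1>) / sqrt(2).

    Expanding the inner product gives cos^2(pi * (phi - l/4)).
    """
    return math.cos(math.pi * (phi - l / 4)) ** 2
```

Sweeping `phi` over a fine grid in $[0, 1)$ shows that the largest of the four probabilities never drops below $1/2 + \sqrt{2}/4$, while every outcome `l` with `dist_mod1(l / 4 - phi) >= 1/4` has probability at most $1/2$. This constant gap is what makes the majority outcome reliable after $O(\log(n/\epsilon))$ repetitions.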

From (10) we determine that $|l_j/4 - 2^{-j}x|_1 < 1/4$ holds for all values of $j$ with probability at least $1 - 4n e^{-k/?}$.

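Now consider the following problem:

Input: $l_1, \ldots, l_n \in \{0, 1, 2, 3\}$.
Promise: There exists $x \in \{0,1\}^n$ such that $|l_j/4 - 2^{-j}x|_1 < 1/4$ for $j = 1, \ldots, n$.
Output: $x$ satisfying the promise.

The following algorithm solves this problem:

1. Define

$$A_0 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \quad A_1 = \begin{pmatrix} 1 & 1 \\ 0 & 0 \end{pmatrix}, \quad A_2 = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad A_3 = \begin{pmatrix} 0 & 0 \\ 1 & 1 \end{pmatrix}.$$

2. Let $x_j = (A_{l_j} A_{l_{j-1}} \cdots A_{l_1})[2,1]$ for each $j$, and output $x = x_n \cdots x_1$.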

Let us now demonstrate that the algorithm is correct. We note that it is straightforward to show that for a given input $l_1, \ldots, l_n$ there is at most one $x$ satisfying the promise, and thus the solution is uniquely determined if the promise holds. To show that the algorithm computes this $x$ correctly, we prove by induction on $j$ that $x_j$ is output correctly. The set $\{A_0, A_1, A_2, A_3\}$ is closed under matrix multiplication, so we must have that the first column of $A_{l_j} \cdots A_{l_1}$ is either

$$e_1 = \begin{pmatrix} 1 \\ 0 \end{pmatrix} \quad \text{or} \quad e_2 = \begin{pmatrix} 0 \\ 1 \end{pmatrix}$$

for each $j$. Thus it suffices to prove that the first column of $A_{l_j} \cdots A_{l_1}$ is $e_1$ if $x_j = 0$ and is $e_2$ if $x_j = 1$. The base case is $j = 1$. Either $x_1 = 0$, in which case the fractional part of $2^{-1}x$ is 0, or $x_1 = 1$, in which case the fractional part of $2^{-1}x$ is 1/2. By the promise, we must therefore have $l_1 \in \{0, 1\}$ in case $x_1 = 0$ and $l_1 \in \{2, 3\}$ in case $x_1 = 1$. Thus the first column of $A_{l_1}$ is $e_1$ if $x_1 = 0$ and is $e_2$ if $x_1 = 1$ as required. Now suppose $x_j, \ldots, x_1$ are output correctly. We want to show that the first column of $A_{l_{j+1}} \cdots A_{l_1}$ is $e_1$ if $x_{j+1} = 0$ and is $e_2$ if $x_{j+1} = 1$. There are four possibilities for the pair $(x_{j+1}, x_j)$ that, along with the promise, give rise to the following implications:

$$\begin{array}{lcl}
x_{j+1} = 0,\ x_j = 0 & \Rightarrow & l_{j+1} \in \{0, 1\} \\
x_{j+1} = 0,\ x_j = 1 & \Rightarrow & l_{j+1} \in \{1, 2\} \\
x_{j+1} = 1,\ x_j = 0 & \Rightarrow & l_{j+1} \in \{2, 3\} \\
x_{j+1} = 1,\ x_j = 1 & \Rightarrow & l_{j+1} \in \{3, 0\}
\end{array}$$

Suppose $x_j = 0$, implying that the first column of $A_{l_j} \cdots A_{l_1}$ is $e_1$. If $x_{j+1} = 0$ then either $l_{j+1} = 0$ or $l_{j+1} = 1$, in either case implying that the first column of $A_{l_{j+1}} \cdots A_{l_1}$ is $e_1$, as required. Similarly, if $x_{j+1} = 1$ then either $l_{j+1} = 2$ or $l_{j+1} = 3$, in either case implying that the first column of $A_{l_{j+1}} \cdots A_{l_1}$ is $e_2$, as required. The case $x_j = 1$ is similar. Thus we have shown that the algorithm operates correctly.
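The reconstruction algorithm can be exercised exhaustively for small $n$. In the sketch below the four matrices are an assumption: they are the unique 0/1 matrices whose action on $e_1, e_2$ matches the four implications in the correctness proof (for instance, $l_{j+1} = 1$ always forces $x_{j+1} = 0$, so $A_1$ sends both $e_1$ and $e_2$ to $e_1$). The sequential loop stands in for the product that a parallel prefix circuit would compute, and `valid_outcomes` and `promised_inputs` are hypothetical helpers used only to enumerate inputs satisfying the promise.

```python
from itertools import product

# Assumed matrices, derived from the four implications in the proof.
A = {
    0: ((1, 0), (0, 1)),
    1: ((1, 1), (0, 0)),
    2: ((0, 1), (1, 0)),
    3: ((0, 0), (1, 1)),
}

def matmul(P, Q):
    """2x2 matrix product with entries reduced modulo 2."""
    return tuple(
        tuple(sum(P[r][t] * Q[t][c] for t in range(2)) % 2 for c in range(2))
        for r in range(2)
    )

def reconstruct(ls):
    """Given ls = [l_1, ..., l_n], return the integer x with bits x_n ... x_1."""
    prod = A[0]                        # identity matrix
    x = 0
    for j, l in enumerate(ls, start=1):
        prod = matmul(A[l], prod)      # prod = A_{l_j} ... A_{l_1}
        x += prod[1][0] * 2 ** (j - 1) # x_j is the entry in row 2, column 1
    return x

def valid_outcomes(x, j):
    """All l in {0,1,2,3} with |l/4 - x/2^j|_1 below 1/4, in exact integer arithmetic."""
    M = 2 ** (j + 2)                   # common denominator of l/4 and x/2^j
    good = []
    for l in range(4):
        d = (l * 2 ** j - 4 * x) % M
        if min(d, M - d) < 2 ** j:
            good.append(l)
    return good

def promised_inputs(x, n):
    """Every input (l_1, ..., l_n) that satisfies the promise for this x."""
    return product(*(valid_outcomes(x, j) for j in range(1, n + 1)))
```

For each `x`, every tuple produced by `promised_inputs(x, n)` is mapped back to `x` by `reconstruct`, including the cases where two different outcomes `l_j` are both within the allowed distance.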

The above algorithm lends itself well to parallelization, following from the parallel prefix method discussed in subsection 3.2; by Theorem 5 all values of $x_j = (A_{l_j} A_{l_{j-1}} \cdots A_{l_1})[2,1]$, $j = 1, \ldots, n$, can be computed by a single circuit of size $O(n)$ and depth $O(\log n)$ (following from the fact that multiplication of the $2 \times 2$ matrices, in modulo 2 arithmetic, can of course be done by constant-size circuits).

It follows that the entire circuit for approximating the mapping (P) given k G 0(log(n/e)) 
copies of Itpx) has size 0(nlog(n/e:)) and depth 0(log n + loglog(n/e)) = 0(logn-|-loglog(l/e)). It 
remains to argue that the circuit operates with error 0{e). This follows from standard results based 
on ideas in Q about converting quantum circuits that perform measurements and produce classical 
information with small error probability into unitary operations (without measurements) that can 
operate on data in superposition. It should be noted that a state $|\psi_x\rangle$ can be conserved throughout
the computation to ensure that errors corresponding to different values of x are orthogonal. 



4 New size bounds for the QFT 

In this section, we prove Theorem 2. Let $F_{2^n}$ denote the Fourier transform modulo $2^n$, which acts
on $n$ qubits. The Hadamard transform is $H = F_2$.

The standard circuit construction for $F_{2^n}$ can be described recursively as follows (where the
two-qubit controlled-phase shift gates of the form $c\text{-}P(\theta)$ are defined in Section 2).

Standard recursive circuit description for $F_{2^n}$:

1. Apply $F_{2^{n-1}}$ to the first $n - 1$ qubits.

2. For each $j \in \{1, 2, \ldots, n - 1\}$, apply $c\text{-}P(1/2^{n-j+1})$ to the $j$-th and $n$-th qubit.

3. Apply $H$ to the $n$-th qubit.

The resulting circuit consists of $n(n - 1)/2$ two-qubit gates and $n$ one-qubit gates.

Below is a more general recursive circuit description for $F_{2^n}$, parameterized by $m \in \{1, \ldots, n - 1\}$.
This coincides with the above circuit when $m = 1$. When $m > 1$, it can be verified that the circuit
does not change very much. It has exactly the same gates, though the relative order of the two-qubit
gates (which all commute with each other) changes.

Generalized recursive circuit description for $F_{2^n}$:

1. Apply $F_{2^{n-m}}$ to the first $n - m$ qubits.

2. For each $j \in \{1, 2, \ldots, n - m\}$ and $k \in \{1, 2, \ldots, m\}$, apply $c\text{-}P(1/2^{n-m-j+k+1})$ to the $j$-th and
$(n - m + k)$-th qubit.

3. Apply $F_{2^m}$ to the last $m$ qubits.
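
The claim that the generalized construction has exactly the same gates for every choice of $m$ can be sanity-checked by counting. The sketch below only counts gates and does not build the circuit; the function `count_gates` and its `choose_m` argument (a rule selecting $m$ for each $n$) are names introduced here for illustration.

```python
def count_gates(n, choose_m):
    """Return (two_qubit_gates, one_qubit_gates) of the recursive circuit for F_{2^n}.

    choose_m(n) must return the split parameter m with 1 <= m <= n - 1.
    """
    if n == 1:
        return (0, 1)  # F_2 is a single Hadamard gate
    m = choose_m(n)
    assert 1 <= m <= n - 1
    first_two, first_one = count_gates(n - m, choose_m)   # Step 1
    last_two, last_one = count_gates(m, choose_m)         # Step 3
    step2 = (n - m) * m                                   # Step 2: one gate per pair (j, k)
    return (first_two + step2 + last_two, first_one + last_one)
```

With `choose_m = lambda n: 1` this reproduces the standard circuit, and with `choose_m = lambda n: n // 2` the balanced split; both give $n(n-1)/2$ two-qubit gates and $n$ one-qubit gates.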

Our new quantum circuits are based on this generalized recursive construction with $m = \lfloor n/2 \rfloor$,
except that they use a more efficient method for performing the transformation in Step 2. As is,
Step 2 consists of $(n - m)m$ two-qubit gates, which is approximately $n^2/4$. The key observation is
that Step 2 computes the mapping which, for $x \in \{0, 1\}^{n-m}$ and $y \in \{0, 1\}^m$, takes the state $|x\rangle|y\rangle$
to the state $(e^{2\pi i/2^n})^{x \cdot y}|x\rangle|y\rangle$, where $x \cdot y$ denotes the product of $x$ and $y$ interpreted as binary
integers. From this, it can be shown that Step 2 can be computed using any classical method
for integer multiplication in conjunction with some one-qubit phase shift gates (of the form $P(\theta)$,
defined in Section 2).



The best asymptotic circuit size for integer multiplication, due to Schönhage and Strassen [33],
is $O(n \log n \log\log n)$, which can be translated into a reversible computation of the same size that
we will denote as $S$. For $x \in \{0, 1\}^{n-m}$ and $y \in \{0, 1\}^m$, $S$ maps the state $|x\rangle|y\rangle|0^n\rangle$ to $|x\rangle|y\rangle|x \cdot y\rangle$.
(There are $O(n)$ additional ancilla qubits that are not explicitly indicated. Each of these begins
and ends in state $|0\rangle$.)

Improved Step 2 in general circuit description for $F_{2^n}$:

1. Apply $S$ to the $2n$ qubits.

2. For each $k \in \{1, 2, \ldots, n\}$ apply $P(1/2^k)$ to the $(n + k)$-th qubit.

3. Apply $S^{-1}$ to the $2n$ qubits.
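
The reason these three steps reproduce the phase $(e^{2\pi i/2^n})^{x \cdot y}$ is that the one-qubit phases multiply out to the binary expansion of $x \cdot y / 2^n$. The Python sketch below checks this identity numerically. It assumes that $P(\theta)$ multiplies $|1\rangle$ by $e^{2\pi i \theta}$ and that qubit $n + k$ holds the $k$-th most significant bit of the $n$-bit product; both conventions are assumptions made here, and the function names are illustrative.

```python
import cmath

def phase_via_product_bits(x, y, n):
    """Phase accumulated by applying P(1/2^k) to the k-th most significant
    bit of the n-bit product x * y, for k = 1, ..., n."""
    prod = x * y
    assert prod < 2 ** n  # holds when x has n - m bits and y has m bits
    phase = 1 + 0j
    for k in range(1, n + 1):
        if (prod >> (n - k)) & 1:
            phase *= cmath.exp(2j * cmath.pi / 2 ** k)
    return phase

def phase_direct(x, y, n):
    """The target phase (e^{2 pi i / 2^n})^(x * y)."""
    return cmath.exp(2j * cmath.pi * x * y / 2 ** n)
```

The two functions agree up to floating-point error for every pair $(x, y)$ in range, which is all that Step 2 requires, since $S^{-1}$ then resets the product register to $|0^n\rangle$.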

Using this improved Step 2 in the generalized recursive circuit description for $F_{2^n}$ results in a
total number of gates that satisfies the recurrence

$$T_n = T_{\lceil n/2 \rceil} + T_{\lfloor n/2 \rfloor} + O(n \log n \log\log n), \qquad (11)$$

which implies that $T_n \in O(n (\log n)^2 \log\log n)$. It is straightforward to also show that the circuit
has depth $O(n)$ and width $O(n)$ (where ancilla qubits are counted as part of the width).
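
As an informal check on how the recurrence resolves, one can evaluate it numerically with a concrete cost function in place of the $O(\cdot)$ term. The sketch below is not a proof; the base case $T_1 = 1$, the cost used for $n < 4$, and the constant 1 in front of $n \log_2 n \log_2\log_2 n$ are all assumptions chosen for illustration.

```python
import math
from functools import lru_cache

def step2_cost(n):
    """Stand-in for the O(n log n log log n) term (constant factor 1 assumed)."""
    if n < 4:
        return float(n)
    return n * math.log2(n) * math.log2(math.log2(n))

@lru_cache(maxsize=None)
def T(n):
    """Gate count from the recurrence T_n = T_ceil(n/2) + T_floor(n/2) + step2_cost(n)."""
    if n == 1:
        return 1.0
    return T((n + 1) // 2) + T(n // 2) + step2_cost(n)

def ratio(n):
    """T_n divided by n (log n)^2 log log n; bounded if the claimed bound holds."""
    return T(n) / (n * math.log2(n) ** 2 * math.log2(math.log2(n)))
```

Evaluating `ratio` on growing powers of two shows it staying bounded, which is consistent with the claimed $O(n (\log n)^2 \log\log n)$ bound.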



5 Factoring via logarithmic-depth quantum circuits

In this section we discuss a simple modification of Shor's factoring algorithm that factors integers in 
polynomial time using logarithmic-depth quantum circuits. It is important to note that we are not 
claiming the existence of logarithmic-depth quantum circuits that take as input some integer n and 
output a non-trivial factor of n with high probability — the method will require (polynomial time) 
classical pre-processing and post-processing that is not known to be parallelizable. The motivation 
for this approach is that, under the assumption that quantum computers can be built, one may
reasonably expect that quantum computation will be expensive while classical computation will be 
inexpensive. 

The main bottleneck of the quantum portion of Shor's factoring algorithm is the modular 
exponentiation. Whether or not modular exponentiation can be parallelized is a long-standing 
open question, and we do not address this question here. Instead, we show that sufficient classical 
pre-processing allows parallelization of the part of the quantum circuit associated with the modular 
exponentiation. Combined with logarithmic-depth circuits for the quantum Fourier transform, we
obtain the result claimed in Theorem ^. 

In order to describe our method, let us briefly review Shor's factoring algorithm, including the
reduction from factoring to order-finding. It is assumed the input is an $n$-bit integer $N$ that is odd
and composite.

1. (Classical) Randomly select $a \in \{2, \ldots, N - 1\}$. If $\gcd(a, N) > 1$ then output $\gcd(a, N)$,
otherwise continue to step 2.

2. (Quantum) Attempt to find information about the order of $a$ in $\mathbb{Z}_N$:

a. Initialize a $2n$-qubit register and an $n$-qubit register to state $|0\rangle|0\rangle$.

b. Perform a Hadamard transform on each qubit of the first register.

c. (Modular exponentiation step.) Perform the unitary mapping:

$$|x\rangle|0\rangle \mapsto |x\rangle|a^x \bmod N\rangle.$$

d. Perform the quantum Fourier transform on the first register and measure (in the computational
basis). Let $y$ denote the result.

3. (Classical) Use the continued fraction algorithm to find relatively prime integers $k$ and $r$ such
that $0 \le k < r < N$ and $|y/2^{2n} - k/r| \le 2^{-2n-1}$. If $a^r \equiv 1 \pmod{N}$ then continue to step 4,
otherwise repeat step 2.

4. (Classical) If $r$ is even, compute $d = \gcd(a^{r/2} - 1, N)$ and output $d$ if it is a nontrivial factor of
$N$. Otherwise go to step 1.
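
The classical post-processing in steps 3 and 4 can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than a full implementation: instead of an explicit continued fraction expansion it uses the standard library's `Fraction.limit_denominator`, which returns the closest fraction with a bounded denominator, and the function names `order_candidate` and `try_factor` are introduced here.

```python
from fractions import Fraction
from math import gcd

def order_candidate(y, num_qubits, N):
    """Step 3: approximate y / 2^num_qubits by k/r with r < N and return r.

    Fraction.limit_denominator stands in for the continued fraction algorithm.
    """
    return Fraction(y, 2 ** num_qubits).limit_denominator(N - 1).denominator

def try_factor(a, r, N):
    """Steps 3 and 4: return a nontrivial factor of N, or None if r fails a check."""
    if pow(a, r, N) != 1:
        return None        # r is not a multiple of the order: repeat step 2
    if r % 2 == 1:
        return None        # odd r: go back to step 1
    d = gcd(pow(a, r // 2, N) - 1, N)
    return d if 1 < d < N else None
```

As a toy example, take $N = 21$ (so $n = 5$ and the first register has 10 qubits) and $a = 2$, whose order is 6. A measurement outcome $y = 171 \approx 2^{10}/6$ gives `order_candidate(171, 10, 21) == 6`, and `try_factor(2, 6, 21)` returns the factor 7. An outcome such as $y = 341 \approx 2 \cdot 2^{10}/6$ yields the candidate 3, which fails the check $a^r \equiv 1$ and triggers a repeat of step 2.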

The key observation is that much of the work required for the modular exponentiation step can
be shifted to the classical computation in step 1 of the procedure. In step 1, the powers $b_0 = a$,
$b_1 = a^2 \bmod N$, $b_2 = a^4 \bmod N$, ..., $b_{2n-1} = a^{2^{2n-1}} \bmod N$ can be computed in polynomial time.

With this information available in step 2, the modular exponentiation step reduces to applying a unitary
operation that maps $|b_0\rangle|b_1\rangle \cdots |b_{2n-1}\rangle|x\rangle|0\rangle$ to $|b_0\rangle|b_1\rangle \cdots |b_{2n-1}\rangle|x\rangle|b_0^{x_0} b_1^{x_1} \cdots b_{2n-1}^{x_{2n-1}} \bmod N\rangle$.
This is essentially an iterated multiplication problem, where one is given $2n$ $n$-bit integers
$b_0^{x_0}, b_1^{x_1}, \ldots, b_{2n-1}^{x_{2n-1}}$ as input and the goal is to compute their product. The most straightforward
way to do this is to perform pairwise multiplications following the structure of a binary tree with $2n$
leaves. Each multiplication can be performed with depth $O(\log n)$ and size $O(n^2)$. The underlying
binary tree has depth $\log(2n)$ and $2n - 1$ internal nodes. Thus, the entire process can be performed
with depth $O((\log n)^2)$ and size $O(n^3)$.
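
The division of labor described above can be illustrated classically. In the Python sketch below, `precompute_powers` plays the role of the classical pre-processing in step 1 and `tree_product` performs the pairwise multiplications layer by layer, reporting the number of layers. The function names are illustrative, and ordinary integer arithmetic stands in for the reversible multiplication circuits.

```python
def precompute_powers(a, N, count):
    """Classical pre-processing: b_j = a^(2^j) mod N for j = 0, ..., count - 1."""
    powers = [a % N]
    for _ in range(count - 1):
        powers.append(powers[-1] * powers[-1] % N)
    return powers

def tree_product(values, N):
    """Multiply values modulo N along a binary tree; return (product, tree_depth)."""
    layer = list(values)
    depth = 0
    while len(layer) > 1:
        # All multiplications within one layer are independent of each other.
        nxt = [layer[i] * layer[i + 1] % N for i in range(0, len(layer) - 1, 2)]
        if len(layer) % 2:
            nxt.append(layer[-1])
        layer = nxt
        depth += 1
    return layer[0], depth

def modexp_via_tree(a, x, N, n):
    """Compute a^x mod N for a 2n-bit exponent x as a product of the selected b_j."""
    b = precompute_powers(a, N, 2 * n)
    factors = [b[j] if (x >> j) & 1 else 1 for j in range(2 * n)]
    return tree_product(factors, N)
```

For $N = 21$, $n = 5$ and $a = 2$, the result agrees with Python's built-in `pow(a, x, N)` for every 10-bit exponent, and the tree has $\lceil \log_2(2n) \rceil = 4$ layers.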

There are alternative methods for performing iterated multiplication achieving various combina- 
tions of depth and size. In particular, it was proved by Beame, Cook and Hoover Q that a product 
such as we have above can be computed by O(logn) depth boolean circuits of size 0(n^(log n)^). 
While 0(n^ log n) qubits may seem a high price to pay in order to save a factor of O(logn) in 
the circuit depth, the result has an interesting consequence regarding simulations of logarithmic- 
depth quantum circuits: if logarithmic-depth quantum circuits can be simulated in polynomial 
time, then factoring can be done in polynomial time as well. It should be noted that the circuits 
of Beame, Cook and Hoover are not logspace-uniform but rather are polynomial-time uniform; the 
best known bound on circuit depth for iterated products in the case of logspace-uniform circuits is
$O(\log n \log\log n)$, due to Reif [31].



6 Lower bounds



Logarithmic-depth lower bounds for exact computations with two-qubit gates are fairly easy to
obtain, based on the fact that the state of some output qubit (usually) critically depends on every
input qubit. Since, by the product form of the Fourier basis states, the last qubit of $|\psi_{x_{n-1} \ldots x_1 x_0}\rangle$ is in state $\frac{1}{\sqrt{2}}(|0\rangle + e^{2\pi i (0.x_{n-1} \ldots x_1 x_0)}|1\rangle)$,
its value depends on all $n$ input qubits to the QFT when its input state is $|x_{n-1} \cdots x_1 x_0\rangle$. The
depth of the circuit must be at least $\log n$ for this to be possible, since an output qubit of a depth-$d$
circuit of two-qubit gates can depend on at most $2^d$ input qubits. This lower bound proof applies not
only to the QFT, but also to QFS computations (defined earlier in the paper). This is because
the output of a QFS on input $|x\rangle|0\rangle$ includes the state $|\psi_x\rangle$.

On the other hand, approximate computations can sometimes be performed with much lower
depth than their exact counterparts. For example, it is shown earlier in the paper that a QFS can
be computed with precision $\epsilon$ by a quantum circuit with depth $O(\log\log(n/\epsilon))$. Note that this
is $O(\log\log n)$ whenever $\epsilon \in 1/n^{O(1)}$. Although this suggests that it is conceivable for a
sub-logarithmic-depth circuit to approximate the QFT with precision $1/n^{O(1)}$, our lower bound theorem implies that
this is not possible. We now prove this theorem.

Let $C$ be a quantum circuit that approximates the inverse QFT with precision $\frac{1}{10}$. In this
section, since we will need to consider distances between mixed states, we adopt the trace distance 
as a measure of distance (see, e.g., [^]). The trace distance between two states with respective 
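density operators $\rho$ and $\sigma$ is given as

$$D(\rho, \sigma) = \tfrac{1}{2} \operatorname{Tr} |\rho - \sigma|, \qquad (12)$$

where, for an operator $A$, $|A| = \sqrt{A^\dagger A}$. For a pair of pure states $|\phi\rangle$ and $|\phi'\rangle$, their trace distance
is $\sqrt{1 - |\langle \phi | \phi' \rangle|^2}$, which is upper bounded by their Euclidean distance.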

On input $|\psi_{x_{n-1} \ldots x_1 x_0}\rangle$, the output state of $C$ contains an approximation of $|x_{n-1} \cdots x_1 x_0\rangle$. In
particular, one of the output qubits of $C$ should be in a state that is an approximation of $|x_{n-1}\rangle$
within $\frac{1}{10}$. Let us refer to this as the high-order output qubit of $C$. If the depth of $C$ is less
than $\log n$ then the high-order output qubit of $C$ cannot depend on all $n$ of its input qubits. Let
$k \in \{0, 1, \ldots, n - 1\}$ be such that the high-order output qubit does not depend on the $k$-th input
qubit (where we index the input qubits right to left starting from 0). Let $r = n - k - 1$.

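Set $z = 2^n - 1$, which is $11 \ldots 1 = 1^n$ in binary. By the product form of the Fourier basis states, $|\psi_z\rangle$ can be written as

$$|\psi_z\rangle = |\mu_{0.1}\rangle |\mu_{0.11}\rangle \cdots |\mu_{0.1^n}\rangle. \qquad (13)$$

Consider the state $|\psi_{z+2^r}\rangle$. Since $z + 2^r \equiv 0^{n-r}1^r \pmod{2^n}$,

$$|\psi_{z+2^r}\rangle = |\mu_{0.1}\rangle |\mu_{0.11}\rangle \cdots |\mu_{0.1^r}\rangle |\mu_{0.01^r}\rangle |\mu_{0.001^r}\rangle \cdots |\mu_{0.0^{n-r}1^r}\rangle. \qquad (14)$$

Note that, on input $|\psi_z\rangle$, the high-order output qubit of $C$ approximates $|1\rangle$ with precision $\frac{1}{10}$
whereas, on input $|\psi_{z+2^r}\rangle$, the high-order output qubit of $C$ approximates $|0\rangle$ with precision $\frac{1}{10}$.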

Now, we consider a state $|\psi'_z\rangle$, which has an interesting relationship with both $|\psi_z\rangle$ and $|\psi_{z+2^r}\rangle$.
Define

$$|\psi'_z\rangle = |\mu_{0.1}\rangle |\mu_{0.11}\rangle \cdots |\mu_{0.1^r}\rangle |\mu_{0.01^r}\rangle |\mu_{0.1^{r+2}}\rangle |\mu_{0.1^{r+3}}\rangle \cdots |\mu_{0.1^n}\rangle. \qquad (15)$$

The states $|\psi_z\rangle$ and $|\psi'_z\rangle$ are identical, except in their $k$-th qubit positions (which are orthogonal:
$|\mu_{0.01^r}\rangle$ vs. $|\mu_{0.1^{r+1}}\rangle$). Since the high-order output qubit of $C$ does not depend on its $k$-th input
qubit, it is the same for input $|\psi'_z\rangle$ as for input $|\psi_z\rangle$. Therefore, the state of the high-order output
qubit of $C$ on input $|\psi'_z\rangle$ is within $\frac{1}{10}$ of $|1\rangle$.

On the other hand, the trace distance between $|\psi'_z\rangle$ and $|\psi_{z+2^r}\rangle$ can be calculated to be below
0.7712, as follows. The two states are identical in qubit positions $n - 1, n - 2, \ldots, k$. In qubit
position $k - 1$, the two states differ by an angle of $\frac{\pi}{4}$, in qubit position $k - 2$ the two states differ
by an angle of $\frac{\pi}{8}$, and so on. Therefore,

$$
\begin{aligned}
|\langle \psi'_z | \psi_{z+2^r} \rangle|
&= |\langle \mu_{0.1^{r+2}} | \mu_{0.001^r} \rangle \langle \mu_{0.1^{r+3}} | \mu_{0.0001^r} \rangle \cdots \langle \mu_{0.1^n} | \mu_{0.0^{n-r}1^r} \rangle| \\
&= \cos(\tfrac{\pi}{4}) \cos(\tfrac{\pi}{8}) \cdots \cos(\tfrac{\pi}{2^{k+1}}) \\
&> \cos(\tfrac{\pi}{4}) \cos(\tfrac{\pi}{8}) \cos(\tfrac{\pi}{16}) \cdots \\
&> 0.6366,
\end{aligned}
$$

where the numerical value for the last inequality is proved in Lemma 6 (below). This implies that
the trace distance between $|\psi'_z\rangle$ and $|\psi_{z+2^r}\rangle$ is less than $\sqrt{1 - (0.6366)^2} < 0.7712$. Since the trace
distance is contractive, it follows that the state of the high-order output of $C$ on input $|\psi'_z\rangle$ has
trace distance less than 0.7712 from the state of the high-order output of $C$ on input $|\psi_{z+2^r}\rangle$. But,
by the triangle inequality, this implies that the trace distance between $|0\rangle$ and $|1\rangle$ is less than
$\frac{1}{10} + 0.7712 + \frac{1}{10} < 1$, which is a contradiction, since $|0\rangle$ and $|1\rangle$ are orthogonal. This completes the
proof of the theorem.

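Lemma 6 $\cos(\tfrac{\pi}{4}) \cos(\tfrac{\pi}{8}) \cos(\tfrac{\pi}{16}) \cdots > 0.6366$.

Proof: We first lower bound the tails of the above infinite product by showing that, for any $i \ge 1$,
$\cos(\tfrac{\pi}{2^{i+1}}) \cos(\tfrac{\pi}{2^{i+2}}) \cos(\tfrac{\pi}{2^{i+3}}) \cdots \ge 1 - \tfrac{\pi^2}{3 \cdot 2^{2i+1}}$. Since, for $t \ge 0$, $\cos(t) \ge 1 - \tfrac{t^2}{2}$,

$$
\cos(\tfrac{\pi}{2^{i+1}}) \cos(\tfrac{\pi}{2^{i+2}}) \cos(\tfrac{\pi}{2^{i+3}}) \cdots
\;\ge\; \prod_{j \ge i+1} \left(1 - \frac{\pi^2}{2^{2j+1}}\right)
\;\ge\; 1 - \sum_{j \ge i+1} \frac{\pi^2}{2^{2j+1}}
\;=\; 1 - \frac{\pi^2}{3 \cdot 2^{2i+1}}.
$$

Now it follows that, for any $i \ge 2$, $\cos(\tfrac{\pi}{4}) \cos(\tfrac{\pi}{8}) \cos(\tfrac{\pi}{16}) \cdots \ge \cos(\tfrac{\pi}{4}) \cdots \cos(\tfrac{\pi}{2^i}) \left(1 - \tfrac{\pi^2}{3 \cdot 2^{2i+1}}\right)$. Setting
$i = 8$ in this inequality gives the numerical lower bound. ∎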
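
The numerical constants in this argument are easy to reproduce. The Python sketch below evaluates the finite product $\cos(\pi/4) \cdots \cos(\pi/2^8)$, multiplies it by a tail bound obtained from $\cos(t) \ge 1 - t^2/2$ (the specific tail bound $1 - \pi^2/(3 \cdot 2^{17})$ is derived here and is an assumption about the intended argument), and compares the result with the exact value $2/\pi$ of the infinite product given by Viète's formula. The precision $\frac{1}{10}$ used in the final check is likewise taken as an assumption.

```python
import math

def cosine_product(first_exp, last_exp):
    """Return the product of cos(pi / 2^j) for j = first_exp, ..., last_exp."""
    prod = 1.0
    for j in range(first_exp, last_exp + 1):
        prod *= math.cos(math.pi / 2 ** j)
    return prod

# Finite part of the product with i = 8, times a lower bound on the remaining tail.
partial = cosine_product(2, 8)
tail_bound = 1 - math.pi ** 2 / (3 * 2 ** 17)
lower_bound = partial * tail_bound

# Numerical value of the (truncated) infinite product; Viete's formula gives 2 / pi.
overlap = cosine_product(2, 60)

# Trace distance bound implied by an overlap of at least 0.6366.
distance_bound = math.sqrt(1 - 0.6366 ** 2)
```

Here `lower_bound` exceeds 0.6366, `overlap` agrees with $2/\pi \approx 0.63662$, and `distance_bound` is just below 0.7712, so that $\frac{1}{10} + 0.7712 + \frac{1}{10} < 1$ as required in the proof.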



7 Other moduli 

In this section we discuss the quantum Fourier transform with respect to moduli that are not 
powers of 2. First we briefly sketch a method for performing (in parallel) the QFT for an arbitrary 
modulus that uses the QFT with a power of 2 modulus as a black box. We then discuss Shor's 
original method for performing the QFT with respect to a "smooth" modulus, and mention how 
this method may be parallelized as well. 

7.1 Arbitrary moduli 

Consider the QFT with respect to an arbitrary modulus m. In this subsection we note that it is 
possible to approximate such a QFT with high accuracy in parallel using circuits for the quantum 
Fourier transform modulo $2^k$ for $k = \lfloor \log m \rfloor + O(1)$. Using the circuits for the quantum Fourier
transform modulo $2^k$ described previously, we have that for any $\epsilon$ and modulus $m$ there exists
a depth $O(\log n \log\log(1/\epsilon))$ quantum circuit that approximates the QFT modulo $m$ to within
accuracy $\epsilon$ for $n = \lceil \log m \rceil$. The size of the circuit is polynomial in $n + \log(1/\epsilon)$.

The method exploits a relation between QFTs with different moduli that was used by Hales 
and Hallgren [^] in regard to the Fourier Sampling problem (see also Høyer [24] for an extension
and simplified proof). 

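The basic components of the technique follow:

1. Create a Fourier state with modulus $m$, which is the mapping

$$|x\rangle|0\rangle \mapsto |x\rangle|\psi_x\rangle.$$

2. Copy the Fourier state, which is the mapping

$$|x\rangle|\psi_x\rangle|0\rangle \cdots |0\rangle \mapsto |x\rangle|\psi_x\rangle|\psi_x\rangle \cdots |\psi_x\rangle.$$

3. Apply the inverse Fourier transform modulo $2^k$ on each state $|\psi_x\rangle$, which is the mapping

$$|x\rangle|\psi_x\rangle \cdots |\psi_x\rangle \mapsto |x\rangle \left(F^\dagger_{2^k}|\psi_x\rangle\right) \cdots \left(F^\dagger_{2^k}|\psi_x\rangle\right).$$

4. For each (computational basis state) $y$ occurring among the collections of qubits on which
$F^\dagger_{2^k}$ was performed, compute $\mathrm{round}(y\, m\, 2^{-k}) \bmod m$, and compute the mode of these results.
With high probability the result will be $x$. (A reasonably straightforward calculation shows
that observation of $F^\dagger_{2^k}|\psi_x\rangle$ in the computational basis yields some $y$ with $\mathrm{round}(y\, m\, 2^{-k}) = x$
with probability greater than $1/2 + \delta$ for some constant $\delta$.) XOR this result to the qubits in
state $|x\rangle$, and reverse the computation of each $\mathrm{round}(y\, m\, 2^{-k})$ and $y$. With high probability
the mapping

$$|x\rangle \left(F^\dagger_{2^k}|\psi_x\rangle\right) \cdots \left(F^\dagger_{2^k}|\psi_x\rangle\right) \mapsto |0\rangle \left(F^\dagger_{2^k}|\psi_x\rangle\right) \cdots \left(F^\dagger_{2^k}|\psi_x\rangle\right)$$

has been performed.

5. Reverse steps 3 and 2, giving the mapping $|0\rangle \left(F^\dagger_{2^k}|\psi_x\rangle\right) \cdots \left(F^\dagger_{2^k}|\psi_x\rangle\right) \mapsto |0\rangle|\psi_x\rangle|0\rangle \cdots |0\rangle$.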
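
The probability claim in step 4 can be explored numerically for small parameters. The Python sketch below computes the outcome distribution of $F^\dagger_{2^k}|\psi_x\rangle$ directly from the amplitudes. It assumes that $|\psi_x\rangle = \frac{1}{\sqrt{m}} \sum_{y=0}^{m-1} e^{2\pi i x y/m}|y\rangle$ is embedded in the first $m$ basis states of a $k$-qubit register, and it rounds halves upward; both are assumptions made here, and the function names are illustrative.

```python
import cmath
import math

def outcome_distribution(x, m, k):
    """Probabilities of observing each z in {0, ..., 2^k - 1} after applying the
    inverse QFT modulo 2^k to the modulus-m Fourier state |psi_x>."""
    size = 2 ** k
    probs = []
    for z in range(size):
        amp = sum(cmath.exp(2j * cmath.pi * y * (x / m - z / size)) for y in range(m))
        probs.append(abs(amp) ** 2 / (m * size))
    return probs

def success_probability(x, m, k):
    """Probability that round(z * m / 2^k) mod m equals x (halves rounded up)."""
    size = 2 ** k
    probs = outcome_distribution(x, m, k)
    return sum(
        p for z, p in enumerate(probs)
        if math.floor(z * m / size + 0.5) % m == x
    )
```

For example, with $m = 5$ and $k = 6$ the success probability exceeds $1/2$ for every $x$, so taking the mode over several independent copies amplifies the chance of recovering $x$.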

Unfortunately some of the methods used in the power of 2 case (such as using three-two adders 
and approximating the individual qubits of the Fourier basis states) do not seem to work in this 
case, which results in the slightly worse depth bound. The overall size bound increases as well, but 
is still polynomial. 

It is interesting to note that this method does not require the larger modulus to be a power 
of 2 — effectively the method shows that the QFT modulo m for any modulus m can be efficiently 
approximated given a black box that approximates the QFT modulo m' for any sufficiently large 
m' . The technical details regarding this method will appear in the final version of this paper. 

7.2 Shor's "mixed-radix" QFT



We conclude with a brief discussion of Shor's original "mixed radix" method for computing the 
quantum Fourier transform, as it too can be parallelized (although to our knowledge not as effi- 
ciently as the power-of-2 case discussed previously in this paper). 

Shor's original method for computing the QFT is based on the Chinese Remainder Theorem 
and its consequences regarding $\mathbb{Z}_m$ for a given modulus $m$. Here the modulus is $m = m_1 m_2 \cdots m_k$
for $m_1, \ldots, m_k$ pairwise relatively prime and $m_j \in O(\log m)$. Thus $k \in O(\log m / \log\log m)$ is
somewhat less than the number of bits of $m$, and each $m_j$ has length logarithmic in the length of
$m$. Taking $m_j$ to be the $j$-th prime results in a sufficiently dense collection of moduli $m$ for factoring
IQ (see Rosser and Schoenfeld |Q for explicit bounds and a detailed analysis of such bounds). 

Although stated somewhat differently by Shor, the mixed radix QFT method may be described
as follows:

1. For $j = 1, \ldots, k$ define $f_j = m/m_j$ and set $g_j \in \{0, \ldots, m_j - 1\}$ such that $g_j \equiv f_j^{-1} \pmod{m_j}$.

2. Define $C$ to be the (reversible) operator acting as follows for each $x \in \{0, \ldots, m - 1\}$:

$$C : |x\rangle \mapsto |(x \bmod m_1), \ldots, (x \bmod m_k)\rangle.$$

3. Define $A$ to be a (reversible) operator such that

$$A : |x_1, \ldots, x_k\rangle \mapsto |g_1 x_1, \ldots, g_k x_k\rangle$$

for each $(x_1, \ldots, x_k) \in \{0, \ldots, m_1 - 1\} \times \cdots \times \{0, \ldots, m_k - 1\}$.

4. Let $F_m$ and $F_{m_j}$ denote the QFT for moduli $m$ and $m_j$, $j = 1, \ldots, k$, respectively. Then the
following relation holds:

$$F_m = C^\dagger (F_{m_1} \otimes \cdots \otimes F_{m_k}) A C. \qquad (16)$$

Thus, to perform the QFT modulo $m$ on $|x\rangle$, first convert $x$ to its modular representation
$(x_1, \ldots, x_k)$ using the operator $C$, multiply each $x_j$ by $g_j$ (modulo $m_j$), perform the QFT
modulo $m_j$ independently on coefficient $j$ (for each $j$), then apply the inverse of $C$ to convert
back to the ordinary representation of elements in $\{0, \ldots, m - 1\}$.

The numbers computed in step 1 are used in the standard proof of the Chinese Remainder Theorem:
given $x_1, \ldots, x_k$, we may compute $x \in \{0, \ldots, m - 1\}$ satisfying $x \equiv x_j \pmod{m_j}$ for each
$j$ by taking $x = \sum_{j=1}^{k} f_j g_j x_j \bmod m$. Thus the operator $C$ can be implemented efficiently, since
the mappings $x \mapsto ((x \bmod m_1), \ldots, (x \bmod m_k))$ and $((x \bmod m_1), \ldots, (x \bmod m_k)) \mapsto x$ are
efficiently computable (e.g., with size O(log^m) circuits |2|). In the present case C can be paral- 
lelized to logarithmic depth, since each of the moduli are small. Similarly, the operator A can be 
parallelized to logarithmic depth. 
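
To see that the relation (16) holds, we may simply examine the action of the operator on the
right hand side on computational basis states (writing $x_j = x \bmod m_j$):

$$
\begin{aligned}
& C^\dagger (F_{m_1} \otimes \cdots \otimes F_{m_k}) A C |x\rangle \\
&= C^\dagger (F_{m_1} \otimes \cdots \otimes F_{m_k}) |g_1 x_1, \ldots, g_k x_k\rangle \\
&= C^\dagger \frac{1}{\sqrt{m}} \sum_{y_1, \ldots, y_k} \exp(2\pi i\, g_1 x_1 y_1 / m_1) \cdots \exp(2\pi i\, g_k x_k y_k / m_k) |y_1, \ldots, y_k\rangle \\
&= C^\dagger \frac{1}{\sqrt{m}} \sum_{y_1, \ldots, y_k} \exp(2\pi i (f_1 g_1 x_1 y_1 + \cdots + f_k g_k x_k y_k)/m) |y_1, \ldots, y_k\rangle \\
&= \frac{1}{\sqrt{m}} \sum_{y_1, \ldots, y_k} \exp(2\pi i (f_1 g_1 x_1 y_1 + \cdots + f_k g_k x_k y_k)/m) |f_1 g_1 y_1 + \cdots + f_k g_k y_k \ (\mathrm{mod}\ m)\rangle \\
&= \frac{1}{\sqrt{m}} \sum_{y} \exp(2\pi i\, x y / m) |y\rangle \\
&= F_m |x\rangle.
\end{aligned}
$$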
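
The two facts this derivation rests on can be verified by brute force for small moduli: the phases on both sides agree, and the map $(y_1, \ldots, y_k) \mapsto \sum_j f_j g_j y_j \bmod m$ is a bijection onto $\{0, \ldots, m - 1\}$. The Python sketch below does exactly this. The function names `crt_data` and `check_relation` are introduced here for illustration, and the code requires Python 3.8 or later for `math.prod` and modular inverses via `pow`.

```python
import cmath
from itertools import product
from math import prod

def crt_data(moduli):
    """Return (m, f, g) with f_j = m / m_j and g_j = f_j^{-1} mod m_j."""
    m = prod(moduli)
    f = [m // mj for mj in moduli]
    g = [pow(fj, -1, mj) for fj, mj in zip(f, moduli)]
    return m, f, g

def check_relation(moduli):
    """Check the phase identity behind relation (16) for pairwise coprime moduli."""
    m, f, g = crt_data(moduli)
    seen = set()
    for ys in product(*(range(mj) for mj in moduli)):
        # Image of |y_1, ..., y_k> under the inverse of C.
        y = sum(fj * gj * yj for fj, gj, yj in zip(f, g, ys)) % m
        seen.add(y)
        for x in range(m):
            # Phase produced by the small QFTs applied after A and C.
            lhs = 1 + 0j
            for gj, mj, yj in zip(g, moduli, ys):
                lhs *= cmath.exp(2j * cmath.pi * gj * (x % mj) * yj / mj)
            # Phase of the QFT modulo m.
            rhs = cmath.exp(2j * cmath.pi * x * y / m)
            if abs(lhs - rhs) > 1e-9:
                return False
    return seen == set(range(m))
```

For instance, `check_relation([2, 3, 5])` and `check_relation([3, 4, 5])` both return `True`; note that the moduli only need to be pairwise relatively prime, not prime.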

Finally, each of the independent QFTs modulo $m_1, \ldots, m_k$ can of course be done in parallel.
Here, however, a problem arises if our goal is to parallelize the entire process. Originally Shor
suggested implementing each of these operations by circuits of size $m_j$ (not $\log m_j$), since any quantum
operation can be computed by exponential-size quantum circuits. This results in
a linear-depth circuit overall, although the circuit will be exact.

However, we may try to compute each $F_{m_j}$ more efficiently. There are a few possibilities for
how to do this, all (apparently) requiring approximations of each $F_{m_j}$. First, we may apply the
method of Kitaev [p^] to approximate these QFTs. Alternately, we may use the arbitrary modulus 



method we have proposed in Section 7.1. Finally, we have noted that this method works for any
two moduli (not just when the larger modulus is a power of 2), so that we may in fact recurse using the
mixed-radix method to approximate each $F_{m_j}$.

In all cases, our analysis has revealed that the mixed radix method results in worse size and/or
depth bounds than the power of 2 method presented earlier in this paper.



8 Conclusion 

We have proved several new bounds on the circuit complexity of approximating the quantum Fourier 
transform, and have applied these bounds to the problem of factoring using quantum circuits. There 
are several related open questions, a few of which we will now discuss. 

First, is it possible to perform the quantum Fourier transform exactly using logarithmic- or 
poly-logarithmic-depth quantum circuits? The best currently known upper bound on the depth of 
the exact QFT is linear in the number of input qubits. 

Next, can the efficiency of our techniques be improved significantly? We have concentrated on 
asymptotic analyses of our circuits, and we believe it is certain that our circuits can be optimized 
significantly for "interesting" input sizes (perhaps several hundred to a few thousand qubits). 

Finally, the fact that the quantum Fourier transform can be performed in logarithmic depth 
suggests the following question: are there interesting natural problems in BQNC (bounded-error 
quantum NC) not known to be in NC or RNC? For instance, computing the gcd of two n-bit 
integers and computing $a^b \bmod c$ and $a^{-1} \bmod c$ for $n$-bit integers $a$, $b$, and $c$ are not known to
be possible using polynomial-size circuits with depth poly-logarithmic in n in the classical setting. 
Are there logarithmic- or poly-logarithmic-depth quantum circuits for these problems? Greenlaw, 
Hoover and Ruzzo list several other problems not known to be classically parallelizable, all of 
which are interesting problems to consider in the quantum setting. 